Emanuele Iaccarino

SQL Parser Data Pipeline

Data Engineering
SQL
Python
Library

A Python library for parsing and interpreting complex SQL queries, designed for BigQuery workflows and adaptable to other SQL dialects.

Published

April 1, 2024

This project introduces SQLParserDataPipeline, a Python package for parsing and interpreting complex SQL queries.
It was designed with BigQuery in mind but is flexible enough to adapt to other SQL dialects thanks to a parsing strategy focused on the inner query structure rather than specific SQL functions.
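
To illustrate what a structure-first strategy can look like, here is a minimal sketch that locates nested SELECT statements purely by tracking parenthesis depth, without any knowledge of dialect-specific functions. The function name `find_subqueries` and the sample query are hypothetical and are not taken from the SQLParserDataPipeline API. The sketch also assumes that no parentheses appear inside string literals or comments.

```python
def find_subqueries(sql: str) -> list[str]:
    """Return every parenthesized group whose text starts with SELECT.

    Groups are reported in the order their closing parenthesis appears,
    so a nested query always comes before the query that encloses it.
    Assumption: no parentheses inside string literals or comments.
    """
    stack = []   # positions of "(" that have not been closed yet
    found = []
    for i, ch in enumerate(sql):
        if ch == "(":
            stack.append(i)
        elif ch == ")" and stack:
            start = stack.pop()
            inner = sql[start + 1:i].strip()
            # Function calls such as MAX(x) are skipped: only SELECTs count.
            if inner.upper().startswith("SELECT"):
                found.append(inner)
    return found


# Hypothetical sample query with a scalar subquery and a nested derived table.
query = (
    "SELECT a, (SELECT MAX(x) FROM t2) AS m "
    "FROM (SELECT a FROM t1 WHERE a IN (SELECT id FROM t3)) AS s"
)

for sub in find_subqueries(query):
    print(sub)
```

Because the scan relies only on bracket structure, the same code works regardless of which functions a dialect provides, which is the general idea behind the adaptability described above.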

Core capabilities
- Select Clause Parsing: handles nested queries, functions, and placeholders beyond the scope of standard parsers
- From Clause Analysis: extracts tables and aliases in medium-complexity queries
- Unnest Transformations: identifies join types, aliases, and unique values, crucial for data pipeline design

Key strengths
- Outperforms standard SQL parsers on queries with nested SELECTs and functions
- Enables clearer lineage extraction and debugging for ETL workflows
- Provides transparent parsing results, supporting modular pipeline development

View project on GitHub



© 2025 Emanuele Iaccarino

 
